{
  "cells": [
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "## 5.7 \u8bfb\u5199\u538b\u7f29\u6587\u4ef6\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### \u95ee\u9898\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\u4f60\u60f3\u8bfb\u5199\u4e00\u4e2agzip\u6216bz2\u683c\u5f0f\u7684\u538b\u7f29\u6587\u4ef6\u3002"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### \u89e3\u51b3\u65b9\u6848\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "gzip \u548c bz2 \u6a21\u5757\u53ef\u4ee5\u5f88\u5bb9\u6613\u7684\u5904\u7406\u8fd9\u4e9b\u6587\u4ef6\u3002\n\u4e24\u4e2a\u6a21\u5757\u90fd\u4e3a open() \u51fd\u6570\u63d0\u4f9b\u4e86\u53e6\u5916\u7684\u5b9e\u73b0\u6765\u89e3\u51b3\u8fd9\u4e2a\u95ee\u9898\u3002\n\u6bd4\u5982\uff0c\u4e3a\u4e86\u4ee5\u6587\u672c\u5f62\u5f0f\u8bfb\u53d6\u538b\u7f29\u6587\u4ef6\uff0c\u53ef\u4ee5\u8fd9\u6837\u505a\uff1a"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# gzip compression\nimport gzip\nwith gzip.open('somefile.gz', 'rt') as f:\n    text = f.read()\n\n# bz2 compression\nimport bz2\nwith bz2.open('somefile.bz2', 'rt') as f:\n    text = f.read()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\u7c7b\u4f3c\u7684\uff0c\u4e3a\u4e86\u5199\u5165\u538b\u7f29\u6570\u636e\uff0c\u53ef\u4ee5\u8fd9\u6837\u505a\uff1a"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "# gzip compression\nimport gzip\nwith gzip.open('somefile.gz', 'wt') as f:\n    f.write(text)\n\n# bz2 compression\nimport bz2\nwith bz2.open('somefile.bz2', 'wt') as f:\n    f.write(text)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\u5982\u4e0a\uff0c\u6240\u6709\u7684I/O\u64cd\u4f5c\u90fd\u4f7f\u7528\u6587\u672c\u6a21\u5f0f\u5e76\u6267\u884cUnicode\u7684\u7f16\u7801/\u89e3\u7801\u3002\n\u7c7b\u4f3c\u7684\uff0c\u5982\u679c\u4f60\u60f3\u64cd\u4f5c\u4e8c\u8fdb\u5236\u6570\u636e\uff0c\u4f7f\u7528 rb \u6216\u8005 wb \u6587\u4ef6\u6a21\u5f0f\u5373\u53ef\u3002"
      ]
    },
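    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "The binary modes mentioned above can be sketched roughly as follows. This is only an illustration: the file name `example_binary.gz` and the payload are hypothetical, and a temporary directory is used so that nothing is left on disk. The same pattern should apply to bz2.open()."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import gzip\n",
        "import os\n",
        "import tempfile\n",
        "\n",
        "payload = bytes(range(256)) * 4  # hypothetical sample bytes, not valid text\n",
        "\n",
        "with tempfile.TemporaryDirectory() as tmpdir:\n",
        "    path = os.path.join(tmpdir, 'example_binary.gz')  # hypothetical file name\n",
        "    # 'wb' accepts bytes and compresses them on the way to disk\n",
        "    with gzip.open(path, 'wb') as f:\n",
        "        f.write(payload)\n",
        "    # 'rb' returns the decompressed bytes with no decoding step\n",
        "    with gzip.open(path, 'rb') as f:\n",
        "        restored = f.read()\n",
        "\n",
        "print(restored == payload)"
      ]
    },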
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "### \u8ba8\u8bba\n"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\u5927\u90e8\u5206\u60c5\u51b5\u4e0b\u8bfb\u5199\u538b\u7f29\u6570\u636e\u90fd\u662f\u5f88\u7b80\u5355\u7684\u3002\u4f46\u662f\u8981\u6ce8\u610f\u7684\u662f\u9009\u62e9\u4e00\u4e2a\u6b63\u786e\u7684\u6587\u4ef6\u6a21\u5f0f\u662f\u975e\u5e38\u91cd\u8981\u7684\u3002\n\u5982\u679c\u4f60\u4e0d\u6307\u5b9a\u6a21\u5f0f\uff0c\u90a3\u4e48\u9ed8\u8ba4\u7684\u5c31\u662f\u4e8c\u8fdb\u5236\u6a21\u5f0f\uff0c\u5982\u679c\u8fd9\u65f6\u5019\u7a0b\u5e8f\u60f3\u8981\u63a5\u53d7\u7684\u662f\u6587\u672c\u6570\u636e\uff0c\u90a3\u4e48\u5c31\u4f1a\u51fa\u9519\u3002\ngzip.open() \u548c bz2.open() \u63a5\u53d7\u8ddf\u5185\u7f6e\u7684 open() \u51fd\u6570\u4e00\u6837\u7684\u53c2\u6570\uff0c\n\u5305\u62ec encoding\uff0cerrors\uff0cnewline \u7b49\u7b49\u3002"
      ]
    },
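    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "To make the point about modes and encodings concrete, here is a small sketch. It assumes a hypothetical file name `example_text.gz` and sample string, and passes `encoding` explicitly so the result does not depend on the platform default. Opening the file without a mode yields bytes, while `rt` yields str."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import gzip\n",
        "import os\n",
        "import tempfile\n",
        "\n",
        "# Hypothetical sample text containing one non-ASCII character\n",
        "sample = 'caf' + chr(233) + ' au lait'\n",
        "\n",
        "with tempfile.TemporaryDirectory() as tmpdir:\n",
        "    path = os.path.join(tmpdir, 'example_text.gz')  # hypothetical file name\n",
        "    with gzip.open(path, 'wt', encoding='utf-8') as f:\n",
        "        f.write(sample)\n",
        "    # No mode given: the default is binary, so read() returns bytes\n",
        "    with gzip.open(path) as f:\n",
        "        raw = f.read()\n",
        "    # Text mode with an explicit encoding returns str\n",
        "    with gzip.open(path, 'rt', encoding='utf-8') as f:\n",
        "        decoded = f.read()\n",
        "\n",
        "print(type(raw).__name__, type(decoded).__name__)\n",
        "print(decoded == sample)"
      ]
    },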
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\u5f53\u5199\u5165\u538b\u7f29\u6570\u636e\u65f6\uff0c\u53ef\u4ee5\u4f7f\u7528 compresslevel \u8fd9\u4e2a\u53ef\u9009\u7684\u5173\u952e\u5b57\u53c2\u6570\u6765\u6307\u5b9a\u4e00\u4e2a\u538b\u7f29\u7ea7\u522b\u3002\u6bd4\u5982\uff1a"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "with gzip.open('somefile.gz', 'wt', compresslevel=5) as f:\n    f.write(text)"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\u9ed8\u8ba4\u7684\u7b49\u7ea7\u662f9\uff0c\u4e5f\u662f\u6700\u9ad8\u7684\u538b\u7f29\u7b49\u7ea7\u3002\u7b49\u7ea7\u8d8a\u4f4e\u6027\u80fd\u8d8a\u597d\uff0c\u4f46\u662f\u6570\u636e\u538b\u7f29\u7a0b\u5ea6\u4e5f\u8d8a\u4f4e\u3002"
      ]
    },
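    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "One rough way to see this trade-off is to compress the same bytes at two levels and compare the sizes. The sketch below uses gzip.compress() on an in-memory payload rather than gzip.open(), purely to keep the example self-contained. The payload is hypothetical, and the exact sizes will depend on the data and on the underlying zlib version."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import gzip\n",
        "\n",
        "# Hypothetical, highly repetitive sample payload\n",
        "data = b'the quick brown fox jumps over the lazy dog ' * 5000\n",
        "\n",
        "fast = gzip.compress(data, compresslevel=1)   # fastest, least compression\n",
        "small = gzip.compress(data, compresslevel=9)  # slowest, most compression\n",
        "\n",
        "# Original size, then the compressed size at each level\n",
        "print(len(data), len(fast), len(small))\n",
        "# Either level restores exactly the same bytes\n",
        "print(gzip.decompress(fast) == gzip.decompress(small) == data)"
      ]
    },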
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\u6700\u540e\u4e00\u70b9\uff0c gzip.open() \u548c bz2.open() \u8fd8\u6709\u4e00\u4e2a\u5f88\u5c11\u88ab\u77e5\u9053\u7684\u7279\u6027\uff0c\n\u5b83\u4eec\u53ef\u4ee5\u4f5c\u7528\u5728\u4e00\u4e2a\u5df2\u5b58\u5728\u5e76\u4ee5\u4e8c\u8fdb\u5236\u6a21\u5f0f\u6253\u5f00\u7684\u6587\u4ef6\u4e0a\u3002\u6bd4\u5982\uff0c\u4e0b\u9762\u4ee3\u7801\u662f\u53ef\u884c\u7684\uff1a"
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import gzip\nf = open('somefile.gz', 'rb')\nwith gzip.open(f, 'rt') as g:\n    text = g.read()"
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "\u8fd9\u6837\u5c31\u5141\u8bb8 gzip \u548c bz2 \u6a21\u5757\u53ef\u4ee5\u5de5\u4f5c\u5728\u8bb8\u591a\u7c7b\u6587\u4ef6\u5bf9\u8c61\u4e0a\uff0c\u6bd4\u5982\u5957\u63a5\u5b57\uff0c\u7ba1\u9053\u548c\u5185\u5b58\u4e2d\u6587\u4ef6\u7b49\u3002"
      ]
    }
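    ,
    {
      "cell_type": "markdown",
      "metadata": {},
      "source": [
        "As an illustration of the in-memory case, the following sketch layers gzip.open() over an io.BytesIO buffer, which here merely stands in for any binary file-like object. The sample string is hypothetical. Because gzip does not close a file object it was handed, the buffer can be rewound and read back afterwards."
      ]
    },
    {
      "cell_type": "code",
      "execution_count": null,
      "metadata": {},
      "outputs": [],
      "source": [
        "import gzip\n",
        "import io\n",
        "\n",
        "buffer = io.BytesIO()  # stands in for a socket, pipe, or other binary stream\n",
        "\n",
        "# Write compressed text into the in-memory buffer\n",
        "with gzip.open(buffer, 'wt', encoding='utf-8') as g:\n",
        "    g.write('hello from memory')\n",
        "\n",
        "# The buffer is still open, so rewind it and decompress\n",
        "buffer.seek(0)\n",
        "with gzip.open(buffer, 'rt', encoding='utf-8') as g:\n",
        "    text = g.read()\n",
        "\n",
        "print(text)"
      ]
    }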
  ],
  "metadata": {
    "kernelspec": {
      "display_name": "Python 3",
      "language": "python",
      "name": "python3"
    },
    "language_info": {
      "codemirror_mode": {
        "name": "ipython",
        "version": 3
      },
      "file_extension": ".py",
      "mimetype": "text/x-python",
      "name": "python",
      "nbconvert_exporter": "python",
      "pygments_lexer": "ipython3",
      "version": "3.7.1"
    },
    "toc": {
      "base_numbering": 1,
      "nav_menu": {},
      "number_sections": true,
      "sideBar": true,
      "skip_h1_title": true,
      "title_cell": "Table of Contents",
      "title_sidebar": "Contents",
      "toc_cell": false,
      "toc_position": {},
      "toc_section_display": true,
      "toc_window_display": true
    }
  },
  "nbformat": 4,
  "nbformat_minor": 2
}